Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce
Abstract
Label Propagation, a standard algorithm for semi-supervised classification, suffers from scalability issues involving memory and computation when used with large-scale graphs from real-world datasets. In this paper we approach Label Propagation as the solution to a system of linear equations, which can be implemented as a scalable parallel algorithm using the map-reduce framework. In addition to semi-supervised classification, this approach allows us to adapt the algorithm for ranking on graphs and to derive the theoretical connection between Label Propagation and PageRank. We provide empirical evidence to that effect using two natural language tasks: lexical relatedness and polarity induction. The version of the Label Propagation algorithm presented here scales linearly in the size of the data with a constant main memory requirement, in contrast to the quadratic cost of both in traditional approaches.
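As a rough illustration of the idea in the abstract, the sketch below treats Label Propagation as an iterative solve of a linear system in which each unlabeled node's score is the weighted average of its neighbours' scores, and splits every iteration into a map step over edges and a reduce step over nodes. This is not the authors' implementation: the choice of a Jacobi-style iteration, the function names (`map_edges`, `reduce_scores`, `label_propagation`), and the toy graph are all assumptions made for illustration. The sketch also keeps everything in memory, so it does not reproduce the constant-memory property claimed above.

```python
from collections import defaultdict


def map_edges(edges, scores):
    """Map step (hypothetical): each directed edge (src, dst, weight)
    sends dst's weighted score, plus the weight itself, to src."""
    for src, dst, weight in edges:
        yield src, (weight * scores[dst], weight)


def reduce_scores(pairs, seeds):
    """Reduce step (hypothetical): per node, sum the incoming messages and
    take the weighted average, then clamp seed nodes to their labels."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for node, (weighted_score, weight) in pairs:
        sums[node][0] += weighted_score
        sums[node][1] += weight
    updated = {n: num / den for n, (num, den) in sums.items() if den > 0}
    updated.update(seeds)  # labeled nodes keep their given values
    return updated


def label_propagation(nodes, edges, seeds, iterations=50):
    """Run a fixed number of map/reduce rounds (a Jacobi-style iteration)."""
    scores = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        updated = reduce_scores(map_edges(edges, scores), seeds)
        # Nodes with no incoming messages keep their previous score.
        scores = {n: updated.get(n, scores[n]) for n in nodes}
    return scores


# Toy chain a - b - c - d with unit weights, stored as directed edge pairs.
nodes = ["a", "b", "c", "d"]
edges = [
    ("a", "b", 1.0), ("b", "a", 1.0),
    ("b", "c", 1.0), ("c", "b", 1.0),
    ("c", "d", 1.0), ("d", "c", 1.0),
]
seeds = {"a": 1.0, "d": 0.0}  # one positive and one negative seed

result = label_propagation(nodes, edges, seeds)
print(result)
```

Each round touches every edge once, so the work per iteration grows linearly with the number of edges. On the toy chain the scores of the unlabeled nodes `b` and `c` approach 2/3 and 1/3, the values at which each equals the average of its two neighbours.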
Similar Resources
A Semi-supervised Method for Multimodal Classification of Consumer Videos
In large databases, the lack of labeled training data leads to major difficulties in classification. Semi-supervised algorithms are employed to mitigate this problem. Video databases are the epitome of such a scenario. Fortunately, graph-based methods have been shown to form promising platforms for semi-supervised video classification. Based on multimodal characteristics of video data, different fe...
The MultiRank Bootstrap Algorithm: Self-Supervised Political Blog Classification and Ranking Using Semi-Supervised Link Classification
We present a new semi-supervised learning algorithm for classifying political blogs in a blog network and ranking them within predicted classes. We test our algorithm on two datasets and achieve classification accuracy of 81.9% and 84.6% using only 2 seed blogs.
A Convex Formulation for Semi-Supervised Multi-Label Feature Selection
The explosive growth of multimedia data has brought the challenge of how to efficiently browse, retrieve and organize these data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several semi-supervised feature selection algorithms have been proposed to exploit both labeled and unlabeled data. However, they are implemented based on graphs, such that th...
Graph Partition Neural Networks for Semi-Supervised Classification
We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally propagating information between the subgraphs. To efficiently partition graphs, we experiment with several partitioning algorithms and also propose a novel vari...
More Is Better: Large Scale Partially-supervised Sentiment Classification
We describe a bootstrapping algorithm to learn from partially labeled data, and the results of an empirical study for using it to improve performance of sentiment classification using up to 15 million unlabeled Amazon product reviews. Our experiments cover semi-supervised learning, domain adaptation and weakly supervised learning. In some cases our methods were able to reduce test error by more...
Publication date: 2009